Abstract

In this contribution, models of wireless channels are derived from the maximum entropy principle, for
several cases where only limited information about the propagation environment is available. First,
analytical models are derived for the cases where certain parameters (channel energy, average energy,
spatial correlation matrix) are known deterministically. Frequently, these parameters are unknown
(typically because the received energy or the spatial correlation varies with the user position), but
still known to represent meaningful system characteristics. In these cases, analytical channel models are
derived by assigning entropy-maximizing distributions to these parameters, and marginalizing them out.
For the MIMO case with spatial correlation, we show that the distribution of the covariance matrices is
conveniently handled through its eigenvalues. The entropy-maximizing distribution of the covariance
matrix is shown to be a Wishart distribution. Furthermore, the corresponding probability density function
of the channel matrix is shown to be described analytically by a function of the channel Frobenius norm.
This technique can provide channel models incorporating the effect of shadow fading and spatial
correlation between antennas without the need to assume explicit values for these parameters. The results
are compared in terms of mutual information to the classical i.i.d. Gaussian model.

Keywords: Maximum Entropy, Multiple Antennas, Wireless Channel Model, Spatial Correlation. 

1 Introduction 

The problem of modelling the characteristics of a wireless transmission channel is crucial to the
design of suitable channel codes. The recent shift to the multiple-antenna, or Multiple-Input
Multiple-Output (MIMO), paradigm [1] and the corresponding need for MIMO channel models, together
with the introduction of codes (such as turbo codes [2]) that can operate very close to the channel
capacity, has placed the channel models under scrutiny: initial capacity analyses of MIMO channels
assuming i.i.d. Rayleigh fading [3] touted promising spectral efficiencies, whereas the correlation
between channel coefficients [4] and the rank of the channel matrix are now understood to be critical
parameters. In order to facilitate channel code development, analytical channel models are a desirable
asset. However, most of the available channel models that capture the complex spatial characteristics
of the propagation channel (geometry, reflection coefficients, ...) are based on ray tracing methods or
variations thereof, which model the channel as a superposition of multipath components [5] and therefore
do not lend themselves easily to analysis. Conversely, some analytical models were proposed to address
the problem of accurate spatial correlation modeling by assuming Rayleigh fading with appropriately
designed correlation properties [6]. See [7] for a broad round-up of the literature about wireless channel
models.



In [8], Debbah and Muller address the question of channel modeling on the basis of statistical inference.
Instead of relying on ad-hoc construction - based on intuition - and verification of the models,
they propose a constructive method based on the constraints that the model needs to meet. The joint
probability density function (PDF) of the channel is derived from these constraints, using the maximum
entropy (MaxEnt) principle, initially introduced by Jaynes [9]. This principle builds on the fact that the
only consistent way of accounting for ignorance when modelling a random process is to maximize the
entropy of the considered process, subject to all known constraints. In this context, consistent modelling
is defined as the requirement that independent modellers being given the same set of constraints must
obtain identical models. This approach is justified on the basis of avoiding the arbitrary introduction
of information (in the form of model characteristics that represent a reduction of its entropy) that
cannot be justified by any known constraint. In the case of channel modelling, the constraints represent
available knowledge about the environment or the channel representation itself (e.g. through bounds on
amplitude, power, ...). See [10] for a recent overview of the application of maximum entropy methods to
inference.

In [8], the MaxEnt principle is used to derive a joint distribution of the entries of the MIMO channel
matrix. The popular Gaussian i.i.d. model is shown to be the entropy-maximizing solution under the
sole assumption that the average Frobenius norm of the channel matrix is known (known channel power
constraint). However, this model is admittedly simplistic, in particular for the following two
reasons:

• Measurements have shown that the independence between components, as obtained in [8] and 
proposed in numerous models, rarely holds in reality, and that some degree of correlation between 
the components must be taken into account, 

• Gaussian models constitute good short-term models but their long-term properties are not realistic. 
More precisely, Gaussian models are known to adequately model the effects of rich scattering, but 
to neglect the long-term fading effect captured by the fact that the signal strength (represented by 
the short-term average of the channel Frobenius norm - in the following this quantity is denoted by 
"channel energy") fluctuates. 

The aim of the present article is to extend the general scope of maximum entropy channel modeling,
by amending existing models to address the aforementioned issues. Both points are addressed using the
same method: first, a maximum entropy model is derived for the channel, conditioned on the parameter
of interest (signal strength or spatial correlation). Then, a maximum entropy distribution is derived
for the parameter of interest itself, and is later marginalized out to obtain the full channel model.

This article is structured as follows: first, some notations are introduced in Section 2. In Section 3, a 
maximum entropy model for the channel energy is proposed, based on the knowledge of the average, 
and optionally on an upper bound, of the channel energy. The corresponding channel model is obtained 
by first deriving the distribution of the instantaneous channel realization for a known channel energy, 
and in a second step by marginalizing out the variable representing the energy using the distribution 
established previously. Section 4 focuses on the spatial correlation properties of frequency-flat fading
channels. Specifically, we address the case where the channel is known to have spatial correlation, but
the exact characteristics of this correlation are not known. In general, in the absence of knowledge about
correlation, application of the MaxEnt principle yields a process with independent components (see [8]).
Therefore, we first focus on the spatial covariance matrix, and derive the MaxEnt distribution of a general
covariance matrix, in both the full-rank and rank-deficient cases. In a second step, we construct the
analytical model for the MIMO channel itself, by first deriving the MaxEnt distribution of the channel 
for a known covariance, and later marginalizing over the covariance matrix, using the distribution of the 
covariance established previously. The obtained distribution is shown to be isotropic, and is described 
analytically as a function of the Frobenius norm of the channel matrix. Finally, Section 5 draws some 
conclusions. 



2 Notations and channel model 

Let us consider the multiple-antenna wireless channel with $n_t$ transmit and $n_r$ receive antennas. Since we
are only concerned with non-frequency-selective channels, let the complex scalar coefficient $h_{i,j}$ denote
the channel attenuation between transmit antenna $j$ and receive antenna $i$, $j = 1 \ldots n_t$, $i = 1 \ldots n_r$.
Let $\mathbf{H}(t)$ denote the $n_r \times n_t$ channel matrix at time $t$. We recall the general model for a time-varying
flat-fading channel with additive noise

$$\mathbf{y}(t) = \mathbf{H}(t)\mathbf{x}(t) + \mathbf{n}(t), \tag{1}$$

where $\mathbf{n}(t)$ is usually modeled as a complex circularly-symmetric Gaussian random variable (r.v.) with
independent identically distributed (i.i.d.) coefficients. In this article, we focus on the derivation of the
fading characteristics of $\mathbf{H}(t)$. When we are not concerned with the time-related properties of $\mathbf{H}(t)$, we
will drop the time index $t$, and refer to the channel realization $\mathbf{H}$ or equivalently to its vectorized notation
$\mathbf{h} = \mathrm{vec}(\mathbf{H}) = [h_{1,1} \ldots h_{n_r,1}, h_{1,2} \ldots h_{n_r,n_t}]^T$. Let us also denote $N = n_r n_t$ and map the antenna indices
into $[1 \ldots N]$, i.e. denoting equivalently $\mathbf{h} = [h_1 \ldots h_N]^T$.



3 Channel energy constraints 

3.1 Average channel energy constraint 

In this section, we briefly recall the results of [8], where an entropy-maximizing probability distribution
is derived for the case where the average energy of a MIMO channel is known deterministically. It is
obtained by maximizing the entropy $\int_{\mathbb{C}^N} -\log(P(\mathbf{H}))P(\mathbf{H})\,d\mathbf{H}$, where $d\mathbf{H} = \prod_{i=1}^{N} d\mathrm{Re}(h_i)\,d\mathrm{Im}(h_i)$ is
the Lebesgue measure on $\mathbb{C}^N$ ($\mathrm{Re}(\cdot)$ and $\mathrm{Im}(\cdot)$ denoting respectively the real and imaginary parts of a
complex number), under the only assumption that the channel has a finite average energy $NE_0$, and the
normalization constraint associated to the definition of a probability density, i.e.

$$\int_{\mathbb{C}^N} \|\mathbf{H}\|_F^2\,P(\mathbf{H})\,d\mathbf{H} = NE_0, \quad\text{and}\quad \int_{\mathbb{C}^N} P(\mathbf{H})\,d\mathbf{H} = 1. \tag{2}$$

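This is achieved through the method of Lagrange multipliers, by writing

$$L(P) = \int_{\mathbb{C}^N} -\log(P(\mathbf{H}))P(\mathbf{H})\,d\mathbf{H} + \beta\left[1 - \int_{\mathbb{C}^N} P(\mathbf{H})\,d\mathbf{H}\right] + \gamma\left[NE_0 - \int_{\mathbb{C}^N} \|\mathbf{H}\|_F^2\,P(\mathbf{H})\,d\mathbf{H}\right] \tag{3}$$

where we introduce the scalar Lagrange coefficients $\beta$ and $\gamma$, and taking the functional derivative [11]
w.r.t. $P$ equal to zero:

$$\frac{\delta L(P)}{\delta P} = -\log(P(\mathbf{H})) - 1 - \beta - \gamma\|\mathbf{H}\|_F^2 = 0. \tag{4}$$

Eq. (4) yields $P(\mathbf{H}) = \exp\left(-(\beta + 1) - \gamma\|\mathbf{H}\|_F^2\right)$, and the normalization of this distribution according to
(2) finally yields the coefficients $\beta$ and $\gamma$, and the final distribution is obtained as

$$P_{H|E_0}(\mathbf{H}) = \frac{1}{(\pi E_0)^N}\exp\left(-\sum_{i=1}^{N}\frac{|h_i|^2}{E_0}\right). \tag{5}$$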

Interestingly, the distribution defined by eq. (5) corresponds to a complex Gaussian random variable with 
independently fading coefficients, although neither Gaussianity nor independence were among the initial 
constraints. These properties are the consequence, via the maximum entropy principle, of the ignorance 
by the modeler of any constraint other than the total average energy $NE_0$.
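
The Gaussian i.i.d. model of eq. (5) can be illustrated with a short simulation. The following is only a minimal sketch, assuming a 2 x 2 antenna setting and $E_0 = 1.5$ (both values are arbitrary choices made for this illustration, as are the function names): it draws channel realizations with independent circularly-symmetric complex Gaussian entries of variance $E_0$ and checks that the empirical average of the squared Frobenius norm is close to $NE_0$.

```python
import random

def sample_iid_channel(n_r, n_t, e0, rng):
    """Draw one n_r x n_t channel matrix with i.i.d. CN(0, e0) entries, as in eq. (5)."""
    sigma = (e0 / 2.0) ** 0.5  # real and imaginary parts each carry variance e0/2
    return [[complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
             for _ in range(n_t)] for _ in range(n_r)]

def frobenius_energy(h):
    """Squared Frobenius norm of a matrix stored as a list of rows."""
    return sum(abs(x) ** 2 for row in h for x in row)

# Illustrative (assumed) parameters: 2 x 2 antennas, E_0 = 1.5.
rng = random.Random(0)
n_r, n_t, e0 = 2, 2, 1.5
trials = 20000
mean_energy = sum(frobenius_energy(sample_iid_channel(n_r, n_t, e0, rng))
                  for _ in range(trials)) / trials
print(mean_energy, n_r * n_t * e0)  # empirical average vs. N * E_0
```

Only the average energy enters the construction; the Gaussian shape and the independence of the entries are what the sampler inherits from eq. (5).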



3.2 Probabilistic average channel energy constraint 

Let us now introduce a new model for situations where the channel model defined in the previous section
applies locally (in time), but where $E_0$ cannot be expected to be constant, e.g. due to shadow fading.
Therefore, let us replace $E_0$ in eq. (5) by the random quantity $E$, known only through its probability
density function (PDF) $P_E(E)$. In this case, the PDF of the channel $\mathbf{H}$ can be obtained by marginalizing
over $E$:

$$P(\mathbf{H}) = \int_{\mathbb{R}^+} P_{H,E}(\mathbf{H},E)\,dE = \int_{\mathbb{R}^+} P_{H|E}(\mathbf{H})P_E(E)\,dE. \tag{6}$$

In order to establish the probability distribution $P_E$, let us find the maximum entropy distribution under
the constraints:

• $0 \le E \le E_{\max}$, where $E_{\max}$ represents an absolute constraint on the transmit power, or on the
amplitude range of the receiver,

• its average $E_0 = \int_0^{E_{\max}} E\,P_E(E)\,dE$ is known.

Applying the Lagrange multipliers method again, we introduce the scalar unknowns $\beta$ and $\gamma$, and maximize
the functional

$$L(P_E) = -\int_0^{E_{\max}} \log(P_E(E))P_E(E)\,dE + \beta\left[\int_0^{E_{\max}} E\,P_E(E)\,dE - E_0\right] + \gamma\left[\int_0^{E_{\max}} P_E(E)\,dE - 1\right]. \tag{7}$$

Taking the derivative equal to zero ($\frac{\delta L(P_E)}{\delta P_E} = 0$) yields $P_E(E) = \exp(\beta E - 1 + \gamma)$, and the Lagrange
multipliers are finally eliminated by solving the normalization equations

$$\int_0^{E_{\max}} E\exp(\beta E - 1 + \gamma)\,dE = E_0, \quad\text{and}\quad \int_0^{E_{\max}} \exp(\beta E - 1 + \gamma)\,dE = 1. \tag{8}$$

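$\beta < 0$ is the solution to the transcendental equation

$$E_{\max}\exp(\beta E_{\max}) - \left(\frac{1}{\beta} + E_0\right)\left(\exp(\beta E_{\max}) - 1\right) = 0, \tag{9}$$

and finally $P_E$ is obtained as the truncated exponential law

$$P_E(E) = \begin{cases}\dfrac{\beta}{\exp(\beta E_{\max}) - 1}\exp(\beta E), & 0 \le E \le E_{\max},\\[2mm] 0 & \text{elsewhere.}\end{cases} \tag{10}$$

Note that taking $E_{\max} = +\infty$ in eq. (9) yields $\beta = -\frac{1}{E_0}$ and the exponential law $P_E(E) = \frac{1}{E_0}\exp\left(-\frac{E}{E_0}\right)$.
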
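Since eq. (9) has no closed-form solution for finite $E_{\max}$, $\beta$ must be found numerically. The following is a minimal sketch using plain bisection on the mean of the truncated exponential law (which is equivalent to eq. (9) after division by $\exp(\beta E_{\max}) - 1$). It assumes $E_0 < E_{\max}/2$, so that the sought root lies on the negative axis; the function names, the bracketing interval and the values $E_0 = 1$, $E_{\max} = 4$ are choices made for this illustration only.

```python
import math

def mean_truncated_exp(beta, e_max):
    """Mean of P_E(E) = beta / (exp(beta*e_max) - 1) * exp(beta*E) on [0, e_max]."""
    denom = math.expm1(beta * e_max)  # exp(beta*e_max) - 1, accurate near beta = 0
    return e_max * math.exp(beta * e_max) / denom - 1.0 / beta

def solve_beta(e0, e_max, lo=-50.0, hi=-1e-6, iters=200):
    """Bisection for the negative root of eq. (9); assumes 0 < e0 < e_max / 2."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        # The mean increases monotonically with beta.
        if mean_truncated_exp(mid, e_max) < e0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

e0, e_max = 1.0, 4.0  # illustrative (assumed) values
beta = solve_beta(e0, e_max)
# Residual of eq. (9) at the computed root.
residual = (e_max * math.exp(beta * e_max)
            - (1.0 / beta + e0) * math.expm1(beta * e_max))
print(beta, residual)
```

Note that $\beta \to 0$ is a spurious limit root of eq. (9) as written, which is why the bracket stops short of zero.
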
3.2.1 Application to the SISO channel 

In order to illustrate the difference between the two situations presented so far, let us investigate the
Single-Input Single-Output (SISO) case $n_t = n_r = 1$, where the channel is represented by a single complex
scalar $h$. Furthermore, since the distribution is circularly symmetric, it is more convenient to consider
the distribution of $r = |h|$. After the change of variables $h = r(\cos\theta + i\sin\theta)$, and marginalization over
$\theta$, eq. (5) becomes

$$P_r(r) = \frac{2r}{E_0}\exp\left(-\frac{r^2}{E_0}\right), \tag{11}$$

whereas eq. (6) yields

$$P_r(r) = \int_0^{E_{\max}} \frac{\beta}{\exp(\beta E_{\max}) - 1}\,\frac{2r}{E}\exp\left(\beta E - \frac{r^2}{E}\right)dE. \tag{12}$$

Note that the integral always exists since $\beta < 0$. Figure 1(a) depicts the probability density functions
(PDFs) of $r$ under the known energy constraint (eq. (11), with $E_0 = 1$), and the known energy distribution
constraint (eq. (12) is computed numerically, for $E_{\max} = 1.5$, $4$ and $+\infty$, taking $E_0 = 1$). Figure 1(b)
depicts the cumulative distribution function (CDF) of the corresponding instantaneous mutual information
$I(r) = \log(1 + \rho r^2)$, for signal-to-noise ratio $\rho = 15$ dB. The lowest range of the CDF is of particular
interest for wireless communications since it represents the probability of a channel outage for a given
transmission rate. The curves clearly show that the models corresponding to the unknown energy have
a lower outage capacity than the Gaussian channel model.
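
The outage comparison can be reproduced approximately by Monte Carlo simulation. The sketch below is an illustration under stated assumptions rather than the procedure used for Figure 1: it only covers the case $E_{\max} = +\infty$ (exponentially distributed $E$), uses natural logarithms, and the sample size and helper names are arbitrary. It relies on the fact that $|h|^2$ is exponentially distributed with mean $E$ when $h$ is circularly-symmetric complex Gaussian with variance $E$.

```python
import math
import random

def mi_samples(rho, e0, trials, rng, random_energy):
    """Sorted samples of I = log(1 + rho * |h|^2) for a SISO channel."""
    samples = []
    for _ in range(trials):
        # Local average energy: fixed (eq. (5)), or exponential with mean e0
        # (eq. (10) with E_max = +infinity).
        e = e0 * rng.expovariate(1.0) if random_energy else e0
        # For h ~ CN(0, e), |h|^2 is exponentially distributed with mean e.
        r2 = e * rng.expovariate(1.0)
        samples.append(math.log(1.0 + rho * r2))
    samples.sort()
    return samples

def quantile(sorted_samples, p):
    """Empirical p-quantile of an already sorted list."""
    return sorted_samples[int(p * len(sorted_samples))]

rng = random.Random(1)
rho = 10.0 ** (15.0 / 10.0)  # 15 dB
trials = 50000
q_gauss = quantile(mi_samples(rho, 1.0, trials, rng, False), 0.01)
q_maxent = quantile(mi_samples(rho, 1.0, trials, rng, True), 0.01)
print(q_gauss, q_maxent)  # 1% outage rates (nats) of the two models
```

The 1% quantile of the model with random energy falls below that of the Gaussian model, in line with the behaviour described for the lower tail of the CDF.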



4 Spatial correlation models 

In this section, we shall incorporate several states of knowledge about the spatial correlation characteristics
of the channel in the framework of maximum entropy modeling. We first study the case where the
correlation matrix is deterministic, and subsequently extend the result to an unknown covariance matrix.

Figure 1: Amplitude and mutual information distributions of the proposed SISO channel models.
(a) PDF of amplitude r. (b) CDF of instantaneous mutual information I(r).



4.1 Deterministic knowledge of the correlation matrix 

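In this section, we establish the maximum entropy distribution of $\mathbf{H}$ under the assumption that the
covariance matrix $\mathbf{Q} = \int_{\mathbb{C}^N} \mathbf{h}\mathbf{h}^H P_{H|Q}(\mathbf{H})\,d\mathbf{H}$ is known, where $\mathbf{Q}$ is an $N \times N$ complex Hermitian matrix.
Each component of the covariance constraint represents an independent linear constraint of the form

$$\int_{\mathbb{C}^N} h_a h_b^*\,P_{H|Q}(\mathbf{H})\,d\mathbf{H} = q_{a,b} \tag{13}$$

for $(a,b) \in [1,\ldots,N]^2$. Note that this constraint makes any previous energy constraint redundant since
$\int_{\mathbb{C}^N} \|\mathbf{H}\|_F^2\,P_{H|Q}(\mathbf{H})\,d\mathbf{H} = \mathrm{tr}(\mathbf{Q})$. Proceeding along the lines of the method exposed previously, we
introduce $N^2$ Lagrange coefficients $\alpha_{a,b}$, and maximize

$$L(P_{H|Q}) = \int_{\mathbb{C}^N} -\log(P_{H|Q}(\mathbf{H}))P_{H|Q}(\mathbf{H})\,d\mathbf{H} + \beta\left[1 - \int_{\mathbb{C}^N} P_{H|Q}(\mathbf{H})\,d\mathbf{H}\right] - \sum_{\substack{a\in[1,\ldots,N]\\ b\in[1,\ldots,N]}} \alpha_{a,b}\left[\int_{\mathbb{C}^N} h_a h_b^*\,P_{H|Q}(\mathbf{H})\,d\mathbf{H} - q_{a,b}\right]. \tag{14}$$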



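Denoting $\mathbf{A} = [\alpha_{a,b}]_{(a,b)\in[1,\ldots,N]^2}$ the $N \times N$ matrix of the Lagrange multipliers, the derivative is

$$\frac{\delta L(P_{H|Q})}{\delta P_{H|Q}} = -\log(P_{H|Q}(\mathbf{H})) - 1 - \beta - \mathbf{h}^T\mathbf{A}\mathbf{h}^* = 0. \tag{15}$$

Therefore, $P_{H|Q}(\mathbf{H}) = \exp\left(-(\beta + 1) - \mathbf{h}^T\mathbf{A}\mathbf{h}^*\right)$, or, after elimination of the Lagrange coefficients through
proper normalization,

$$P_{H|Q}(\mathbf{H},\mathbf{Q}) = \frac{1}{\det(\pi\mathbf{Q})}\exp\left(-\left(\mathbf{h}^H\mathbf{Q}^{-1}\mathbf{h}\right)\right). \tag{16}$$
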

Again, the maximum entropy principle yields a Gaussian distribution, although of course its components 
are not independent anymore. 
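
For simulation purposes, a channel vector distributed according to eq. (16) can be generated by colouring an i.i.d. Gaussian vector with a square root of $\mathbf{Q}$. The sketch below is one possible way of doing this, assuming a strictly positive definite $\mathbf{Q}$ and using a Cholesky factor $\mathbf{Q} = \mathbf{L}\mathbf{L}^H$; the 2 x 2 matrix and all function names are hypothetical choices made for this illustration.

```python
import math
import random

def cholesky(q):
    """Lower-triangular L with q = L L^H, for a Hermitian positive definite q."""
    n = len(q)
    l = [[0j] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k].conjugate() for k in range(j))
            if i == j:
                l[i][i] = complex(math.sqrt((q[i][i] - s).real))
            else:
                l[i][j] = (q[i][j] - s) / l[j][j]  # l[j][j] is real and positive
    return l

def sample_h(chol, rng):
    """Draw h = L w with w i.i.d. CN(0, 1), so that E[h h^H] = L L^H."""
    n = len(chol)
    s = math.sqrt(0.5)
    w = [complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(n)]
    return [sum(chol[i][k] * w[k] for k in range(i + 1)) for i in range(n)]

# Hypothetical covariance matrix, chosen only for illustration.
q = [[1.0 + 0j, 0.5 + 0.5j], [0.5 - 0.5j, 2.0 + 0j]]
chol = cholesky(q)
rng = random.Random(2)
trials = 20000
est = [[0j, 0j], [0j, 0j]]  # empirical estimate of E[h_a h_b^*]
for _ in range(trials):
    h = sample_h(chol, rng)
    for a in range(2):
        for b in range(2):
            est[a][b] += h[a] * h[b].conjugate() / trials
print(est)
```

Any other square root of $\mathbf{Q}$ (for instance one obtained from its eigendecomposition) would serve equally well, since only $\mathbf{L}\mathbf{L}^H$ matters for the resulting distribution.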



4.2 Knowledge of the existence of a correlation matrix 

It was shown in Section 3.1 that in the absence of information on space correlation, maximum entropy 
modeling yields i.i.d. coefficients for the channel matrix, and therefore an identity covariance matrix. 
We now consider the case where covariance is known to be a parameter of interest, but is not known 
deterministically. Again, we will proceed in two steps, first seeking a probability distribution function for 
the covariance matrix Q, and then marginalizing the channel distribution over Q.

4.2.1 Correlation matrix PDF



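Let us first establish the distribution of $\mathbf{Q}$, under the energy constraint $\int \mathrm{tr}(\mathbf{Q})P_Q(\mathbf{Q})\,d\mathbf{Q} = NE_0$, by
maximizing the functional

$$L(P_Q) = \int -\log(P_Q(\mathbf{Q}))P_Q(\mathbf{Q})\,d\mathbf{Q} + \beta\left[\int P_Q(\mathbf{Q})\,d\mathbf{Q} - 1\right] + \gamma\left[\int \mathrm{tr}(\mathbf{Q})P_Q(\mathbf{Q})\,d\mathbf{Q} - NE_0\right]. \tag{17}$$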



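Due to their structure, covariance matrices are restricted to the space $\mathcal{S}$ of $N \times N$ positive semidefinite
complex matrices. Therefore, let us perform the variable change to the eigenvalues/eigenvectors space.
Specifically, let us denote $\boldsymbol{\Lambda} = \mathrm{diag}(\lambda_1 \ldots \lambda_N)$ the diagonal matrix containing the eigenvalues of $\mathbf{Q}$, and
let $\mathbf{U}$ be the unitary matrix containing the eigenvectors, such that $\mathbf{Q} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^H$.

We use the mapping between the space of complex $N \times N$ self-adjoint matrices (of which $\mathcal{S}$ is a subspace)
and $\mathcal{U}(N)/T \times \mathbb{R}^N_{\le}$, where $\mathcal{U}(N)/T$ denotes the space of unitary $N \times N$ matrices with real, non-negative
first row, and $\mathbb{R}^N_{\le}$ is the space of real $N$-tuples with non-decreasing components (see [12, Lemma 4.4.6]).
The positive semidefinite property of the covariance matrices further restricts the components of $\boldsymbol{\Lambda}$ to
non-negative values, and therefore $\mathcal{S}$ maps into $\mathcal{U}(N)/T \times \mathbb{R}^{+N}_{\le}$.

Let us now define the function $F$ over $\mathcal{U}(N)/T \times \mathbb{R}^{+N}_{\le}$ as

$$F(\mathbf{U},\boldsymbol{\Lambda}) = P_Q(\mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^H), \quad \mathbf{U} \in \mathcal{U}(N)/T,\ \boldsymbol{\Lambda} \in \mathbb{R}^{+N}_{\le}. \tag{18}$$

According to this mapping, eq. (17) becomes

$$L(F) = -\int_{\mathcal{U}(N)/T \times \mathbb{R}^{+N}_{\le}} \log(F(\mathbf{U},\boldsymbol{\Lambda}))F(\mathbf{U},\boldsymbol{\Lambda})K(\boldsymbol{\Lambda})\,d\mathbf{U}\,d\boldsymbol{\Lambda} + \beta\left[\int_{\mathcal{U}(N)/T \times \mathbb{R}^{+N}_{\le}} F(\mathbf{U},\boldsymbol{\Lambda})K(\boldsymbol{\Lambda})\,d\mathbf{U}\,d\boldsymbol{\Lambda} - 1\right] + \gamma\left[\int_{\mathcal{U}(N)/T \times \mathbb{R}^{+N}_{\le}} \left(\sum_{i=1}^{N}\lambda_i\right)F(\mathbf{U},\boldsymbol{\Lambda})K(\boldsymbol{\Lambda})\,d\mathbf{U}\,d\boldsymbol{\Lambda} - NE_0\right] \tag{19}$$

where we introduced the corresponding Jacobian $K(\boldsymbol{\Lambda}) = \frac{(2\pi)^{N(N-1)/2}}{\prod_{j=1}^{N-1} j!}\prod_{i<j}(\lambda_i - \lambda_j)^2$, and used
$\mathrm{tr}(\mathbf{Q}) = \mathrm{tr}(\boldsymbol{\Lambda}) = \sum_{i=1}^{N}\lambda_i$. Maximizing the entropy of the distribution $P_Q$ by taking $\frac{\delta L(F)}{\delta F} = 0$ yields

$$-K(\boldsymbol{\Lambda}) - K(\boldsymbol{\Lambda})\log(F(\mathbf{U},\boldsymbol{\Lambda})) + \beta K(\boldsymbol{\Lambda}) + \gamma\left(\sum_{i=1}^{N}\lambda_i\right)K(\boldsymbol{\Lambda}) = 0. \tag{20}$$

Since $K(\boldsymbol{\Lambda}) \neq 0$ except on a set of measure zero, this is equivalent to

$$F(\mathbf{U},\boldsymbol{\Lambda}) = \exp\left(\beta - 1 + \gamma\sum_{i=1}^{N}\lambda_i\right). \tag{21}$$

Note that the distribution $F(\mathbf{U},\boldsymbol{\Lambda})K(\boldsymbol{\Lambda})$ does not explicitly depend on $\mathbf{U}$. This indicates that $\mathbf{U}$ is
uniformly distributed, with constant density $P_U = (2\pi)^N$ over $\mathcal{U}(N)/T$. Therefore, the joint density can
be factored as $F(\mathbf{U},\boldsymbol{\Lambda})K(\boldsymbol{\Lambda}) = P_U P_\Lambda(\boldsymbol{\Lambda})$, where the distribution of the eigenvalues over $\mathbb{R}^{+N}_{\le}$ is

$$P_\Lambda(\boldsymbol{\Lambda}) = \frac{e^{\beta-1}}{P_U}\,\frac{(2\pi)^{N(N-1)/2}}{\prod_{j=1}^{N-1} j!}\exp\left(\gamma\sum_{i=1\ldots N}\lambda_i\right)\prod_{i<j}(\lambda_i - \lambda_j)^2. \tag{22}$$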



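At this point, it is worth noting that the form of eq. (22) indicates that the order of the eigenvalues is
immaterial. In order to see this, consider a pair of eigenvalues $(\lambda_i, \lambda_j)$, with $i < j$ and $\lambda_i \le \lambda_j$, and the
change of variables $(x, y) = \left(\frac{\lambda_i + \lambda_j}{\sqrt{2}}, \frac{\lambda_j - \lambda_i}{\sqrt{2}}\right)$. For any function $f(\lambda_i, \lambda_j)$,

$$\int_{0 \le \lambda_i \le \lambda_j < +\infty} f(\lambda_i,\lambda_j)\,d\lambda_i\,d\lambda_j = \int_{x=0}^{+\infty}\int_{y=0}^{x} f(x,y)\,dy\,dx, \tag{23}$$

whereas for the non-restricted integral

$$\int_{(\lambda_i,\lambda_j)\in\mathbb{R}^{+2}} f(\lambda_i,\lambda_j)\,d\lambda_i\,d\lambda_j = \int_{x=0}^{+\infty}\int_{y=-x}^{x} f(x,y)\,dy\,dx. \tag{24}$$

Note that for every function $f$ s.t. $f(x,y) = f(x,-y)$,

$$\int_{y=-x}^{x} f(x,y)\,dy = 2\int_{y=0}^{x} f(x,y)\,dy, \tag{25}$$

and therefore

$$\int_{(\lambda_i,\lambda_j)\in\mathbb{R}^{+2}} f(\lambda_i,\lambda_j)\,d\lambda_i\,d\lambda_j = 2\int_{0 \le \lambda_i \le \lambda_j < +\infty} f(\lambda_i,\lambda_j)\,d\lambda_i\,d\lambda_j. \tag{26}$$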

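Since the probability distribution $P_\Lambda(\boldsymbol{\Lambda})$ in (22) obviously verifies the property $f(x,y) = f(x,-y)$ in
the rotated space for any $1 \le i, j \le N$, this reasoning (generalized to any permutation of the ordered
eigenvalues) applies to $P_\Lambda(\boldsymbol{\Lambda})$. Therefore, for the sake of simplicity we will now work with the PDF
$P'_\Lambda(\boldsymbol{\Lambda})$ of the joint distribution of the unordered eigenvalues, defined over $\mathbb{R}^{+N}$. Note that its restriction
to the set of the ordered eigenvalues is proportional to $P_\Lambda(\boldsymbol{\Lambda})$. More precisely,

$$\forall \boldsymbol{\Lambda} \in \mathbb{R}^{+N}, \quad P'_\Lambda(\boldsymbol{\Lambda}) = \frac{1}{N!}P_\Lambda(\lambda_{s(1)},\ldots,\lambda_{s(N)}) \tag{27}$$

where $s$ is any permutation of $\{1 \ldots N\}$ such that $\lambda_{s(1)} \le \lambda_{s(2)} \le \cdots \le \lambda_{s(N)}$, and the coefficient
$1/N!$ comes from the number of permutations of the $N$ eigenvalues. Since $P_\Lambda(\lambda_{s(1)},\ldots,\lambda_{s(N)}) =
P_\Lambda(\lambda_1,\ldots,\lambda_N)$, this yields

$$P'_\Lambda(\boldsymbol{\Lambda}) = C\exp\left(\gamma\sum_{i=1\ldots N}\lambda_i\right)\prod_{i<j}(\lambda_i - \lambda_j)^2, \tag{28}$$

where the value of $C = \frac{e^{\beta-1}(2\pi)^{N(N-1)/2}}{P_U\,N!\,\prod_{j=1}^{N-1} j!}$ can be determined by solving the normalization equation for the
probability distribution $P'_\Lambda$:

$$1 = \int_{\mathbb{R}^{+N}} P'_\Lambda(\boldsymbol{\Lambda})\,d\boldsymbol{\Lambda} = C\int_{\mathbb{R}^{+N}}\prod_{i=1}^{N} e^{\gamma\lambda_i}\prod_{i<j}(\lambda_i - \lambda_j)^2\,d\boldsymbol{\Lambda} \tag{29}$$

$$= C\left(-\frac{1}{\gamma}\right)^{N^2}\int_{\mathbb{R}^{+N}}\prod_{i=1}^{N} e^{-x_i}\prod_{i<j}(x_i - x_j)^2\,dx_1 \ldots dx_N \tag{30}$$

$$= C\left(-\frac{1}{\gamma}\right)^{N^2}\prod_{j=0}^{N-1}\frac{\Gamma(j+2)\Gamma(j+1)}{\Gamma(2)}, \tag{31}$$

where we used the change of variables $x_i = -\gamma\lambda_i$ and the Selberg integral (see [13, eq. (17.6.5)]). This
yields $C = (-\gamma)^{N^2}\prod_{n=1}^{N}[n!(n-1)!]^{-1}$. Furthermore, the energy constraint gives $\int_{\mathbb{R}^{+N}} \mathrm{tr}(\boldsymbol{\Lambda})P'_\Lambda(\boldsymbol{\Lambda})\,d\boldsymbol{\Lambda} = -\frac{N^2}{\gamma} = NE_0$,
and we finally obtain the final expression of the eigenvalue distribution

$$P'_\Lambda(\boldsymbol{\Lambda}) = \left(\frac{N}{E_0}\right)^{N^2}\prod_{n=1}^{N}\frac{1}{n!(n-1)!}\exp\left(-\frac{N}{E_0}\sum_{i=1\ldots N}\lambda_i\right)\prod_{i<j}(\lambda_i - \lambda_j)^2. \tag{32}$$
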

In order to obtain the final distribution of $\mathbf{Q}$, let us first note that since the order of the eigenvalues has
been shown to be immaterial, the restriction of $\mathbf{U}$ to $\mathcal{U}(N)/T$ is not necessary, and $\mathbf{Q}$ is distributed as
$\mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^H$, where the distribution of $\boldsymbol{\Lambda}$ is given by eq. (32) and $\mathbf{U}$ is Haar distributed (uniform on $\mathcal{U}(N)$).
Furthermore, note that eq. (32) is a particular case of the density of the eigenvalues of a complex Wishart
matrix [14, 15]. We recall that the complex $N \times N$ Wishart matrix with $K$ degrees of freedom and
covariance $\boldsymbol{\Sigma}$ (denoted by $W_N(K, \boldsymbol{\Sigma})$) is the matrix $\mathbf{A} = \mathbf{B}\mathbf{B}^H$ where $\mathbf{B}$ is an $N \times K$ matrix whose columns
are complex circularly-symmetric independent Gaussian vectors with covariance $\boldsymbol{\Sigma}$. Indeed, eq. (32)
describes the unordered eigenvalue density of a $W_N(N, \frac{E_0}{N}\mathbf{I}_N)$ matrix. Taking into account the isotropic
property of the distribution of $\mathbf{U}$, we can conclude that $\mathbf{Q}$ itself is also a $W_N(N, \frac{E_0}{N}\mathbf{I}_N)$ Wishart matrix.
A similar result, with a slightly different constraint, was obtained by Adhikari in [16], where it is shown
that the entropy-maximizing distribution of a positive definite matrix with known mean $\mathbf{G}$ follows a
Wishart distribution with $N + 1$ degrees of freedom, more precisely the $W_N(N + 1, \frac{\mathbf{G}}{N+1})$ distribution.

The isotropic property of the obtained Wishart distribution (due to the fact that $\mathbf{U}$ is Haar distributed,
i.e. there is no privileged direction for the eigenvectors of the covariance matrix $\mathbf{Q}$), is a consequence
of the fact that no spatial constraints were imposed on the correlation. The energy constraint (imposed
through the trace) only affects the distribution of the eigenvalues of $\mathbf{Q}$. Note also that the generation for
simulation purposes of $\mathbf{Q}$ according to the Wishart distribution obtained above is easy, since it can be
obtained as $\mathbf{Q} = \frac{E_0}{N}\mathbf{B}\mathbf{B}^H$, where $\mathbf{B}$ is an $N \times N$ matrix with i.i.d. complex circularly-symmetric Gaussian
coefficients of unit variance.
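
The generation procedure described above can be sketched in a few lines. The following is a minimal illustration, assuming $N = 4$ and $E_0 = 1$ (arbitrary values) and hypothetical function names: it builds $\mathbf{Q} = \frac{E_0}{N}\mathbf{B}\mathbf{B}^H$ from an i.i.d. Gaussian matrix and checks empirically that the average trace is close to $NE_0$, as required by the energy constraint.

```python
import random

def sample_covariance(n, e0, rng):
    """Draw Q = (e0 / n) * B B^H, with B an n x n matrix of i.i.d. CN(0, 1) entries."""
    s = 0.5 ** 0.5  # standard deviation of the real and imaginary parts
    b = [[complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(n)]
         for _ in range(n)]
    return [[(e0 / n) * sum(b[i][k] * b[j][k].conjugate() for k in range(n))
             for j in range(n)] for i in range(n)]

# Illustrative (assumed) parameters.
rng = random.Random(3)
n, e0, trials = 4, 1.0, 5000
mean_trace = 0.0
for _ in range(trials):
    q = sample_covariance(n, e0, rng)
    mean_trace += sum(q[i][i].real for i in range(n)) / trials
print(mean_trace, n * e0)  # empirical E[tr(Q)] vs. N * E_0
```

Each sample is Hermitian and positive semidefinite by construction, so no projection step is needed.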



4.2.2 Application to the Kronecker channel model 

We highlight the fact that the result of Section 4.2 is directly applicable to the case where the channel
correlation is known to be separable between transmitter and receiver. In this case [17], the full correlation
matrix $\mathbf{Q}$ is known to be the Kronecker product of the transmit and receive correlation matrices, i.e.
$\mathbf{Q} = \mathbf{Q}_T \otimes \mathbf{Q}_R$, where $\mathbf{Q}_T$ and $\mathbf{Q}_R$ are respectively the transmit and receive correlation matrices. This
channel model is therefore denoted by "Kronecker model", see [18] for an overview of its applicability.
The stochastic nature of $\mathbf{Q}_T$ and $\mathbf{Q}_R$ is barely mentioned in the literature, since the correlation matrices
are usually assumed to be measurable quantities associated to a particular antenna array shape and
propagation environment. However, in situations where these are not known (for instance, if the array
shape is not known at the time of the channel code design, or if the properties of the scattering environment
cannot be determined), but the Kronecker model is assumed to hold, our analysis suggests that the
maximum entropy choice for the distributions of $\mathbf{Q}_T$ and $\mathbf{Q}_R$ is independent, complex Wishart distributions
with respectively $n_t$ and $n_r$ degrees of freedom.



4.2.3 Marginalization over Q 

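The complete distribution of the correlated channel can be obtained by marginalizing out $\mathbf{Q}$, using its
distribution as established in Section 4.2.1. The distribution of $\mathbf{H}$ is obtained through

$$P_H(\mathbf{H}) = \int_{\mathcal{S}} P_{H|Q}(\mathbf{H},\mathbf{Q})P_Q(\mathbf{Q})\,d\mathbf{Q} = \int_{\mathcal{U}(N)\times\mathbb{R}^{+N}} P_{H|Q}(\mathbf{H},\mathbf{U},\boldsymbol{\Lambda})P'_\Lambda(\boldsymbol{\Lambda})\,d\mathbf{U}\,d\boldsymbol{\Lambda}. \tag{33}$$

Let us rewrite the conditional probability density of eq. (16) as

$$P_{H|Q}(\mathbf{h},\mathbf{U},\boldsymbol{\Lambda}) = \frac{1}{\pi^N\det(\boldsymbol{\Lambda})}e^{-\mathbf{h}^H\mathbf{U}\boldsymbol{\Lambda}^{-1}\mathbf{U}^H\mathbf{h}} = \frac{1}{\pi^N\det(\boldsymbol{\Lambda})}e^{-\mathrm{tr}(\mathbf{h}\mathbf{h}^H\mathbf{U}\boldsymbol{\Lambda}^{-1}\mathbf{U}^H)}. \tag{34}$$

Using this expression in (33), we obtain

$$P_H(\mathbf{H}) = \frac{1}{\pi^N}\int_{\mathbb{R}^{+N}}\int_{\mathcal{U}(N)} e^{-\mathrm{tr}(\mathbf{h}\mathbf{h}^H\mathbf{U}\boldsymbol{\Lambda}^{-1}\mathbf{U}^H)}\,d\mathbf{U}\,\det(\boldsymbol{\Lambda})^{-1}P'_\Lambda(\boldsymbol{\Lambda})\,d\boldsymbol{\Lambda}. \tag{35}$$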



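Following the notations of [19], let $\det(f(i,j))$ denote the determinant of a matrix with the $(i,j)$-th
element given by an arbitrary function $f(i,j)$. Also, let $\Delta(\mathbf{X})$ denote the Vandermonde determinant of
the eigenvalues $x_i$ of the matrix $\mathbf{X}$

$$\Delta(\mathbf{X}) = \det\left(x_j^{i-1}\right) = \prod_{i>j}(x_i - x_j). \tag{36}$$

Using these notations, let us recall the Harish-Chandra-Itzykson-Zuber (HCIZ) integral [20]

$$\int_{\mathcal{U}(N)} e^{\kappa\,\mathrm{tr}(\mathbf{A}\mathbf{U}\mathbf{B}\mathbf{U}^H)}\,d\mathbf{U} = \left(\prod_{n=1}^{N-1} n!\right)\kappa^{N(1-N)/2}\,\frac{\det\left(e^{\kappa A_i B_j}\right)}{\Delta(\mathbf{A})\Delta(\mathbf{B})} \tag{37}$$
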
where $\mathbf{A}$ and $\mathbf{B}$ are any Hermitian matrices with respective eigenvalues $A_1,\ldots,A_N$ and $B_1,\ldots,B_N$.
Let us make the Haar integral in (35) explicit using the Harish-Chandra-Itzykson-Zuber result by identifying
$\mathbf{A} = \mathbf{h}\mathbf{h}^H$ and $\mathbf{B} = \boldsymbol{\Lambda}^{-1}$. Note however that we cannot directly apply the result in (37) since $\mathbf{A}$ is rank
one: $N - 1$ of its eigenvalues coincide (at zero), and therefore $\Delta(\mathbf{A}) = 0$. This can be resolved by taking the
limit of all other eigenvalues to zero one by one, and applying l'Hôpital's rule. Therefore, let $\mathbf{A}$ be a Hermitian
matrix which has its $N$th eigenvalue $A_N$ equal to $\mathbf{h}^H\mathbf{h}$, and whose other eigenvalues $A_1,\ldots,A_{N-1}$ are arbitrary,
positive values that will eventually be set to 0. Letting

$$I(\mathbf{H},A_1,\ldots,A_{N-1}) = \frac{1}{\pi^N}\int_{\mathbb{R}^{+N}}\int_{\mathcal{U}(N)} e^{-\mathrm{tr}(\mathbf{A}\mathbf{U}\boldsymbol{\Lambda}^{-1}\mathbf{U}^H)}\,d\mathbf{U}\,\det(\boldsymbol{\Lambda})^{-1}P'_\Lambda(\boldsymbol{\Lambda})\,d\boldsymbol{\Lambda}, \tag{38}$$

$P_H(\mathbf{H})$ can be determined as the limit distribution when the first $N - 1$ eigenvalues of $\mathbf{A}$ go to zero:

$$P_H(\mathbf{H}) = \lim_{A_1,\ldots,A_{N-1}\to 0} I(\mathbf{H},A_1,\ldots,A_{N-1}). \tag{39}$$

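Applying the HCIZ to integrate over $\mathbf{U}$ yields

$$I(\mathbf{H},A_1,\ldots,A_{N-1}) = \frac{(-1)^{N(N-1)/2}}{\pi^N}\left(\prod_{n=1}^{N-1} n!\right)\int_{\mathbb{R}^{+N}}\frac{\det\left(e^{-A_i/\lambda_j}\right)}{\Delta(\mathbf{A})\Delta(\boldsymbol{\Lambda}^{-1})}\det(\boldsymbol{\Lambda})^{-1}P'_\Lambda(\boldsymbol{\Lambda})\,d\boldsymbol{\Lambda} \tag{40}$$

$$= \frac{1}{\pi^N\Delta(\mathbf{A})}\left(\prod_{n=1}^{N-1} n!\right)\int_{\mathbb{R}^{+N}}\frac{\det\left(e^{-A_i/\lambda_j}\right)\det(\boldsymbol{\Lambda})^{N-2}}{\Delta(\boldsymbol{\Lambda})}P'_\Lambda(\boldsymbol{\Lambda})\,d\boldsymbol{\Lambda} \tag{41}$$

where we used the identity

$$\Delta(\boldsymbol{\Lambda}^{-1}) = \det\left(\lambda_j^{1-i}\right) = (-1)^{N(N+3)/2}\,\frac{\Delta(\boldsymbol{\Lambda})}{\det(\boldsymbol{\Lambda})^{N-1}}. \tag{42}$$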

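Then, let us decompose the determinant product using the expansion formula: for an arbitrary $N \times N$
matrix $\mathbf{X} = (X_{i,j})$,

$$\det(\mathbf{X}) = \sum_{\mathbf{a}\in\mathcal{P}_N}(-1)^{\mathbf{a}}\prod_{n=1}^{N} X_{n,a_n} = \frac{1}{N!}\sum_{\mathbf{a},\mathbf{b}\in\mathcal{P}_N}(-1)^{\mathbf{a}+\mathbf{b}}\prod_{n=1}^{N} X_{a_n,b_n} \tag{43}$$

where $\mathbf{a} = [a_1,\ldots,a_N]$, $\mathcal{P}_N$ denotes the set of all permutations of $[1,\ldots,N]$, and $(-1)^{\mathbf{a}}$ is the sign of the
permutation. Using the first form of the expansion, we obtain

$$\Delta(\boldsymbol{\Lambda})\det\left(e^{-A_i/\lambda_j}\right) = \det\left(\lambda_i^{j-1}\right)\det\left(e^{-A_j/\lambda_i}\right) \tag{44}$$

$$= \left(\sum_{\mathbf{a}\in\mathcal{P}_N}(-1)^{\mathbf{a}}\prod_{n=1}^{N}\lambda_n^{a_n-1}\right)\left(\sum_{\mathbf{b}\in\mathcal{P}_N}(-1)^{\mathbf{b}}\prod_{m=1}^{N} e^{-A_{b_m}/\lambda_m}\right) \tag{45}$$

$$= \sum_{\mathbf{a},\mathbf{b}\in\mathcal{P}_N}(-1)^{\mathbf{a}+\mathbf{b}}\prod_{n=1}^{N}\lambda_n^{a_n-1}e^{-A_{b_n}/\lambda_n}. \tag{46}$$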
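
The two forms of the determinant expansion in eq. (43) can be checked numerically on a small example. The sketch below is purely illustrative (the 3 x 3 integer matrix and the function names are arbitrary choices for this check): the first form sums over one permutation of the column indices, while the second sums over independent permutations of rows and columns and divides by $N!$.

```python
from itertools import permutations
from math import factorial

def sign(perm):
    """Sign of a permutation given as a tuple of 0-based indices."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det_first_form(x):
    """Sum over a of sign(a) * prod_n x[n][a_n]."""
    n = len(x)
    total = 0
    for a in permutations(range(n)):
        term = sign(a)
        for r in range(n):
            term *= x[r][a[r]]
        total += term
    return total

def det_second_form(x):
    """(1/N!) * sum over (a, b) of sign(a) * sign(b) * prod_n x[a_n][b_n]."""
    n = len(x)
    perms = list(permutations(range(n)))
    total = 0
    for a in perms:
        for b in perms:
            term = sign(a) * sign(b)
            for r in range(n):
                term *= x[a[r]][b[r]]
            total += term
    return total / factorial(n)

x = [[2, -1, 0], [1, 3, 4], [0, 5, -2]]  # arbitrary test matrix
d1 = det_first_form(x)
d2 = det_second_form(x)
print(d1, d2)
```

The second form is redundant by a factor $N!$, but it is the one that allows a double sum over permutations to be recognized as a single determinant later in the derivation.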

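Note that in (44) we used the invariance of the second determinant by transposition in order to simplify
subsequent derivations. Therefore,

$$I(\mathbf{H},A_1,\ldots,A_{N-1}) = \frac{C}{\pi^N\Delta(\mathbf{A})}\left(\prod_{n=1}^{N-1} n!\right)\int_{\mathbb{R}^{+N}}\sum_{\mathbf{a},\mathbf{b}\in\mathcal{P}_N}(-1)^{\mathbf{a}+\mathbf{b}}\prod_{n=1}^{N}\lambda_n^{N+a_n-3}e^{-A_{b_n}/\lambda_n}e^{\gamma\lambda_n}\,d\boldsymbol{\Lambda} \tag{47}$$

$$= \frac{C}{\pi^N\Delta(\mathbf{A})}\left(\prod_{n=1}^{N-1} n!\right)\sum_{\mathbf{a},\mathbf{b}\in\mathcal{P}_N}(-1)^{\mathbf{a}+\mathbf{b}}\prod_{n=1}^{N}\int_{\mathbb{R}^+}\lambda_n^{N+a_n-3}e^{-A_{b_n}/\lambda_n}e^{\gamma\lambda_n}\,d\lambda_n \tag{48}$$

$$= \frac{C\,N!}{\pi^N}\left(\prod_{n=1}^{N-1} n!\right)\frac{\det\left(f_i(A_j)\right)}{\Delta(\mathbf{A})} \tag{49}$$
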
where we let $f_i(x) = \int_{\mathbb{R}^+} t^{N+i-3}e^{-x/t}e^{\gamma t}\,dt$, and recognize the second form of the determinant
expansion (eq. (43)). In order to obtain the limit as $A_1,\ldots,A_{N-1}$ go to zero, we use a result from [19,
Appendix III], which states that the limit of the ratio $\frac{\det(f_i(x_j))}{\Delta(\mathbf{X})}$ of the singular determinants as the first
$p$ eigenvalues go to $x_0$ is

$$\lim_{x_1,x_2,\ldots,x_p\to x_0}\frac{\det(f_i(x_j))}{\Delta(\mathbf{X})} = \frac{\det\left[f_i(x_0);\ f_i'(x_0);\ \ldots;\ f_i^{(p-1)}(x_0);\ f_i(x_{p+1});\ \ldots;\ f_i(x_N)\right]}{\Delta(x_{p+1},\ldots,x_N)\prod_{i=p+1}^{N}(x_i - x_0)^p\prod_{j=1}^{p-1} j!} \tag{50}$$




where the first $p$ columns in the right-hand side determinant represent the successive derivatives of the
function $f_i$, and the rows correspond to different values of $i = 1,\ldots,N$. Applying this - with $p = N - 1$
and $x_0 = 0$ since $\mathbf{A}$ has only one non-zero eigenvalue - yields

$$P_H(\mathbf{H}) = \lim_{A_1,A_2,\ldots,A_{N-1}\to 0} I(\mathbf{H},A_1,\ldots,A_{N-1}) \tag{51}$$

$$= \frac{(-\gamma)^{N^2}}{\pi^N}\,x_N^{1-N}\prod_{n=1}^{N-1}[n!(n-1)!]^{-1}\det\left[f_i(0);\ f_i'(0);\ \ldots;\ f_i^{(N-2)}(0);\ f_i(x_N)\right]. \tag{52}$$



At this point, it becomes obvious from (52) that the probability of $\mathbf{H}$ depends only on its norm (recall
that $x_N = \mathbf{h}^H\mathbf{h}$ by definition of $\mathbf{A}$). The distribution of $\mathbf{h}$ is isotropic, and is completely determined by
the probability density function $P_x(x)$ of having $\mathbf{h}$ s.t. $\mathbf{h}^H\mathbf{h} = x$.

Therefore, for a given $x$, $\mathbf{h}$ is uniformly distributed over $S^{N-1}(x) = \{\mathbf{h}\ \text{s.t.}\ \mathbf{h}^H\mathbf{h} = x\}$, the zero-centered
complex hypersphere of radius $\sqrt{x}$. Its volume is $V_N(x) = \frac{\pi^N x^N}{N!}$, and its surface is $S_N(x) = \frac{dV_N(x)}{dx} =
\frac{\pi^N x^{N-1}}{(N-1)!}$. Therefore, we can write the probability density function of $x$ as

$$P_x(x) = \int_{S^{N-1}(x)} P_H(\mathbf{h})\,d\mathbf{h} = \frac{(-\gamma)^{N^2}}{(N-1)!}\prod_{n=1}^{N-1}[n!(n-1)!]^{-1}\det\left[f_i(0);\ f_i'(0);\ \ldots;\ f_i^{(N-2)}(0);\ f_i(x)\right]. \tag{53}$$



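In order to simplify the expression of the successive derivatives of $f_i$, it is useful to identify the Bessel
K-function [21, Section 8.432], and to replace it by its infinite sum expansion [21, Section 8.446]

$$f_i(x) = 2\left(\frac{x}{-\gamma}\right)^{\frac{i+N-2}{2}}K_{i+N-2}\left(2\sqrt{-\gamma x}\right) \tag{54}$$

$$= (-\gamma)^{-i-N+2}\sum_{k=0}^{i+N-3}(-1)^k\frac{(i+N-3-k)!}{k!}(-\gamma x)^k + (-1)^{i+N-1}x^{i+N-2}\sum_{k=0}^{+\infty}\frac{(-\gamma x)^k}{k!\,(i+N-2+k)!}\left(\ln(-\gamma x) - \psi(k+1) - \psi(i+N-1+k)\right). \tag{55}$$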



Note that there is only one term in the sum with a non-zero $p$th derivative at 0. Therefore, the $p$th
derivative of $f_i$ at 0 is simply (for $0 \le p \le N - 2$)

$$f_i^{(p)}(0) = (-1)^{-i-N}\gamma^{p-i-N+2}(i+N-3-p)!. \tag{56}$$



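Let us bring the last column to become the first, and expand the resulting determinant along its first
column:

$$\det\left[f_i(0);\ \ldots;\ f_i^{(N-2)}(0);\ f_i(x)\right] = (-1)^{N-1}\det\left[f_i(x);\ f_i(0);\ \ldots;\ f_i^{(N-2)}(0)\right] \tag{57}$$

$$= (-1)^{N-1}\sum_{n=1}^{N}(-1)^{1+n}f_n(x)\det\left[f_{i,\bar{n}}^{(0)}(0);\ \ldots;\ f_{i,\bar{n}}^{(N-2)}(0)\right] \tag{58}$$

where $f_{i,\bar{n}}^{(p)}(0)$ is the $N - 1$ dimensional column obtained by removing the $n$th element from $f_i^{(p)}(0)$.

Factoring the $(-1)^{-i-N}\gamma^{p-i-N+2}$ in the expression of $f_i^{(p)}(0)$ out of the determinant yields

$$\det\left[f_{i,\bar{n}}^{(0)}(0);\ \ldots;\ f_{i,\bar{n}}^{(N-2)}(0)\right] = (-1)^{n+N(N+1)/2}\,\gamma^{n-N^2+N-1}\det\left(\mathbf{g}^{(n)}\right) \tag{59}$$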
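where the $N - 1$ dimensional matrix $\mathbf{g}^{(n)}$ has the elements

$$g^{(n)}_{l,k} = \Gamma\left(q_l^{(n)} + N - k - 1\right) \tag{60}$$

where

$$q_l^{(n)} = \begin{cases} l, & l \le n - 1,\\ l + 1, & l \ge n.\end{cases} \tag{61}$$

Using the fact that $\Gamma(q_l^{(n)} + i) = q_l^{(n)}\Gamma(q_l^{(n)} + i - 1) + (i - 1)\Gamma(q_l^{(n)} + i - 1)$, note that the $k$th column of
$\mathbf{g}^{(n)}$ is

$$g^{(n)}_{l,k} = q_l^{(n)}\Gamma\left(q_l^{(n)} + N - k - 2\right) + (N - k - 2)\,g^{(n)}_{l,k+1}. \tag{62}$$
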

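Since the second term is proportional to the $(k + 1)$th column, it can be omitted without changing the
value of the determinant. Applying this property to the first $N - 2$ pairs of consecutive columns, and
repeating this process again to the first $N - 3, \ldots, 1$ pairs of columns, we obtain

$$\det\left(\mathbf{g}^{(n)}\right) = \det\left(\Gamma(q_l^{(n)} + N - 2);\ \ldots;\ \Gamma(q_l^{(n)} + 2);\ \Gamma(q_l^{(n)} + 1);\ \Gamma(q_l^{(n)})\right) \tag{63}$$

$$= \det\left(q_l^{(n)}\Gamma(q_l^{(n)} + N - 3);\ \ldots;\ q_l^{(n)}\Gamma(q_l^{(n)} + 1);\ q_l^{(n)}\Gamma(q_l^{(n)});\ \Gamma(q_l^{(n)})\right) \tag{64}$$

$$= \det\left(\left(q_l^{(n)}\right)^2\Gamma(q_l^{(n)} + N - 4);\ \ldots;\ \left(q_l^{(n)}\right)^2\Gamma(q_l^{(n)});\ q_l^{(n)}\Gamma(q_l^{(n)});\ \Gamma(q_l^{(n)})\right) \tag{65}$$

$$= \det\left(\left(q_l^{(n)}\right)^{N-1-k}\Gamma(q_l^{(n)})\right) \tag{66}$$

$$= \prod_{l=1}^{N-1}\Gamma(q_l^{(n)})\,\det\left(\left(q_l^{(n)}\right)^{N-1-k}\right) \tag{67}$$

$$= (-1)^{(N-1)(N-2)/2}\prod_{l=1}^{N-1}\Gamma(q_l^{(n)})\,\det\left(\left(q_l^{(n)}\right)^{k-1}\right) \tag{68}$$
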

where the last two equalities are obtained respectively by factoring out the r(gj™ ') factors (common to 
all terms on the Zth row) and inverting the order of the columns in order to get a proper Vandermondc 
structure. Finally, the determinant can be computed using (36), as 



detL^ 1 ) = n (■ 

' l<j<i<N-l 



(n) (n) 



n (i-n 

K l<j<i<n-1 
1-2 JV-1 



= II II 



t! 



n (i+i-i) 

l<j<n<i<AT-l 
JV-1 



n (*-•?) 

i n<j<i<N-l 



n ( < -») , =7^ 



nJV-l ., 
i=l l! 



11 ■ 11 (i - n + 1)! 1 

2—1 2— n i—n+1 

Wrapping up the above derivations, one obtains successively 



(n- l)!(iV-n)!' 



(69) 
(70) 
(71) 



det 

Finally, we obtain 



det(g<">) 

/^(0);...;^- 2) (0) 

/^(O);...;/^^);/^) 



'JV-1 



l[ i\) (-1)^-1)^- 



2)/2. 



[(n-l)ir(JV-n)!' 



dct 



'•N-l \ /_ 1 N„+l / y Tl _JV 2 +iV-l 

[(n- l)!] 2 (iV-n)! 

2 



n 

i=l 



JV 



'JV-1 



,,n-JV J +JV-l 



n=l 



. i=l 



[(n-l)!] 2 (jV-n)! 



JV 



„JV+n-l 



n=l 



[(n- l)!] 2 (iV-n)!' 



(72) 
(73) 
(74) 

(75) 
where $\gamma$ = —

The corresponding PDF is shown in Figure 2(a), as well as the PDF of the instantaneous power of a 
Gaussian i.i.d. channel of the same size and mean power. As expected, the energy distribution of the 
proposed model is more spread out than the energy of a Gaussian i.i.d. channel. 

Figure 2(b) shows the CDF curves of the instantaneous mutual information achieved over the channel 
described in eq. (1) by these two channel models. The proposed model differs in particular in the tails of 
the distribution: for instance, the 1% outage capacity is reduced from 4.5 to 3.9 nats w.r.t. the Gaussian 
i.i.d. model. 




(a) PDF of $x = \|\mathbf{H}\|_F^2$; (b) CDF of instantaneous mutual information $I$

Figure 2: Amplitude and mutual information distributions of the proposed channel models for a 4 × 4
antenna setting.



4.3 Limited-rank covariance matrix 

In this section, we address the situation where the modeler takes into account the existence of a covariance
matrix of rank $L < N$ (we assume that $L$ is known). As in the full-rank case, we will use
the eigendecomposition $\mathbf{Q} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^H$ of the covariance matrix, with $\boldsymbol{\Lambda} = \mathrm{diag}(\lambda_1, \ldots, \lambda_L, 0, \ldots, 0)$. Let
us denote $\boldsymbol{\Lambda}_L = \mathrm{diag}(\lambda_1, \ldots, \lambda_L)$. The maximum entropy probability density of $\mathbf{Q}$ with the extra rank
constraint is unsurprisingly similar to the one derived in Section 4.2.1, with the difference that all the
energy is carried by the first $L$ eigenvalues, i.e. $\mathbf{U}$ is uniformly distributed over $U(N)$, while

L 2 L / \ 

P ^=(m) n^iji«p b4 £ A * n c*-a,)». (^) 

v u/ n=l v ' \ u i=l...£ / i<j<L 

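However, the definition of the conditional probability density $P_{\mathbf{h}|\mathbf{Q}}(\mathbf{h}, \mathbf{U}, \boldsymbol{\Lambda})$ in eq. (16) does not hold
when $\mathbf{Q}$ is not full rank: $\mathbf{h}$ becomes a degenerate Gaussian random variable. Its projection onto the $L$-dimensional
subspace associated with the non-zero eigenvalues of $\mathbf{Q}$ follows a Gaussian law, whereas the
probability of $\mathbf{h}$ being outside of this subspace is zero. The conditional probability in eq. (34) must
therefore be rewritten as

$$P_{\mathbf{h}|\mathbf{Q}}(\mathbf{h}, \mathbf{U}, \boldsymbol{\Lambda}_L) = \mathbf{1}_{\{\mathbf{h} \in \mathrm{Span}(\mathbf{U}_{[L]})\}}\ \frac{1}{\pi^L \prod_{i=1}^{L}\lambda_i}\ e^{-\mathbf{h}^H \mathbf{U}_{[L]} \boldsymbol{\Lambda}_L^{-1} \mathbf{U}_{[L]}^H \mathbf{h}}, \tag{77}$$

where $\mathbf{U}_{[L]}$ denotes the $N \times L$ matrix obtained by truncating the last $N - L$ columns of $\mathbf{U}$. The indicator
function ($\mathbf{1}_{A} = 1$ if statement $A$ is true, 0 otherwise) ensures that $P_{\mathbf{h}|\mathbf{Q}}(\mathbf{h}, \mathbf{U}, \boldsymbol{\Lambda}_L)$ is zero for $\mathbf{h}$ outside of the
column span of $\mathbf{U}_{[L]}$.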
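The degenerate Gaussian described above can be illustrated numerically. The following sketch assumes the usual complex Gaussian normalization $1/(\pi^L \prod_i \lambda_i)$ on the $L$-dimensional subspace; the sizes, the eigenvalues and the helper names (`crandn`, `log_pdf_h_given_q`) are arbitrary choices made for illustration. Any matrix with orthonormal columns serves as $\mathbf{U}_{[L]}$ here.

```python
import numpy as np

rng = np.random.default_rng(0)
N, L = 16, 4  # illustrative sizes (assumed)


def crandn(*shape):
    # i.i.d. circularly symmetric complex Gaussian samples with unit variance
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


# Orthonormal basis of an L-dimensional subspace: first L columns of a unitary matrix.
U, _ = np.linalg.qr(crandn(N, N))
U_L = U[:, :L]
lam = rng.uniform(0.5, 2.0, size=L)  # arbitrary non-zero eigenvalues (assumed)


def log_pdf_h_given_q(h, U_L, lam, tol=1e-9):
    """Log of the rank-L conditional density: -inf outside span(U_L)."""
    k = U_L.conj().T @ h  # coordinates of h in the subspace
    if np.linalg.norm(h - U_L @ k) > tol * max(1.0, np.linalg.norm(h)):
        return -np.inf    # indicator function is zero
    return float(-len(lam) * np.log(np.pi) - np.sum(np.log(lam))
                 - np.sum(np.abs(k) ** 2 / lam))


# A sample of the degenerate Gaussian: Gaussian inside the subspace, nothing outside.
h_in = U_L @ (np.sqrt(lam) * crandn(L))
# A generic vector of C^N lies outside the subspace with probability one.
h_out = crandn(N)

print(log_pdf_h_given_q(h_in, U_L, lam), log_pdf_h_given_q(h_out, U_L, lam))
```

The first value is finite while the second is `-inf`, which is the role played by the indicator function.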
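We now need to marginalize over $\mathbf{U}$ and $\boldsymbol{\Lambda}_L$ in order to obtain the PDF of $\mathbf{h}$:

$$P_{\mathbf{h}}(\mathbf{h}) = \int_{U(N)\times\mathbb{R}_+^L} P_{\mathbf{h}|\mathbf{Q}}(\mathbf{h}, \mathbf{U}, \boldsymbol{\Lambda}_L)\, P_{\boldsymbol{\Lambda}_L}(\boldsymbol{\Lambda}_L)\, d\mathbf{U}\, d\boldsymbol{\Lambda}_L. \tag{78}$$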

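However, the expression of $P_{\mathbf{h}|\mathbf{Q}}(\mathbf{h}, \mathbf{U}, \boldsymbol{\Lambda}_L)$ does not lend itself directly to the marginalization described
in Section 4.2.3, since the zero eigenvalues of $\mathbf{Q}$ complicate the analysis. This difficulty can be avoided
by performing the marginalization of the covariance in an $L$-dimensional subspace. In order to see this,
consider an $L \times L$ unitary matrix $\mathbf{B}_L$, and note that the $N \times N$ block matrix

$$\mathbf{B} = \begin{pmatrix} \mathbf{B}_L & \mathbf{0} \\ \mathbf{0} & \mathbf{I}_{N-L} \end{pmatrix}$$

is unitary as well. Since the uniform distribution over $U(N)$ is unitarily invariant, $\mathbf{U}\mathbf{B}$ is uniformly
distributed over $U(N)$, and for any $\mathbf{B}_L \in U(L)$ we have

$$P_{\mathbf{h}}(\mathbf{h}) = \int_{U(N)\times\mathbb{R}_+^L} P_{\mathbf{h}|\mathbf{Q}}(\mathbf{h}, \mathbf{U}\mathbf{B}, \boldsymbol{\Lambda}_L)\, P_{\boldsymbol{\Lambda}_L}(\boldsymbol{\Lambda}_L)\, d\mathbf{U}\, d\boldsymbol{\Lambda}_L. \tag{79}$$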
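The two algebraic facts used in this step can be verified numerically: the block matrix $\mathbf{B}$ is unitary, and the first $L$ columns of $\mathbf{U}\mathbf{B}$ equal $\mathbf{U}_{[L]}\mathbf{B}_L$, so that replacing $\mathbf{U}$ by $\mathbf{U}\mathbf{B}$ leaves the column span unchanged and turns the quadratic form in the exponent into one that involves only $\mathbf{k} = \mathbf{U}_{[L]}^H\mathbf{h}$ and $\mathbf{B}_L$. The sketch below uses arbitrary illustrative sizes and a hypothetical helper `haar_unitary` (QR decomposition of a complex Gaussian matrix with phase correction).

```python
import numpy as np


def haar_unitary(n, rng):
    # QR of a complex Gaussian matrix; fixing the phases of diag(R) yields
    # a unitary matrix drawn from the uniform (Haar) distribution over U(n).
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))


rng = np.random.default_rng(1)
N, L = 16, 4  # illustrative sizes (assumed)

U = haar_unitary(N, rng)
B_L = haar_unitary(L, rng)
B = np.eye(N, dtype=complex)
B[:L, :L] = B_L  # block-diagonal matrix diag(B_L, I_{N-L})

U_L = U[:, :L]
UB_L = (U @ B)[:, :L]  # first L columns of UB

lam = rng.uniform(0.5, 2.0, size=L)  # arbitrary non-zero eigenvalues (assumed)
h = U_L @ ((rng.standard_normal(L) + 1j * rng.standard_normal(L)) / np.sqrt(2))
k = U_L.conj().T @ h

# Quadratic form with U replaced by UB ...
w = UB_L.conj().T @ h
lhs = np.vdot(w, w / lam).real
# ... equals k^H B_L Lambda_L^{-1} B_L^H k.
v = B_L.conj().T @ k
rhs = np.vdot(v, v / lam).real

print(np.allclose(B.conj().T @ B, np.eye(N)), np.allclose(UB_L, U_L @ B_L), lhs, rhs)
```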

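Furthermore, since $\int_{U(L)} d\mathbf{B}_L = 1$,

\begin{align}
P_{\mathbf{h}}(\mathbf{h}) &= \int_{U(L)}\int_{U(N)\times\mathbb{R}_+^L} P_{\mathbf{h}|\mathbf{Q}}(\mathbf{h}, \mathbf{U}\mathbf{B}, \boldsymbol{\Lambda}_L)\, P_{\boldsymbol{\Lambda}_L}(\boldsymbol{\Lambda}_L)\, d\mathbf{U}\, d\boldsymbol{\Lambda}_L\, d\mathbf{B}_L \tag{80}\\
&= \int_{\mathbf{U}\in U(N)} \mathbf{1}_{\{\mathbf{h} \in \mathrm{Span}(\mathbf{U}_{[L]})\}} \int_{U(L)\times\mathbb{R}_+^L} \frac{1}{\pi^L \prod_{i=1}^{L}\lambda_i}\, e^{-\mathbf{h}^H \mathbf{U}_{[L]} \mathbf{B}_L \boldsymbol{\Lambda}_L^{-1} \mathbf{B}_L^H \mathbf{U}_{[L]}^H \mathbf{h}}\, P_{\boldsymbol{\Lambda}_L}(\boldsymbol{\Lambda}_L)\, d\mathbf{B}_L\, d\boldsymbol{\Lambda}_L\, d\mathbf{U} \tag{81}\\
&= \int_{U(N)} \mathbf{1}_{\{\mathbf{h} \in \mathrm{Span}(\mathbf{U}_{[L]})\}}\, P_{\mathbf{k}}\left(\mathbf{U}_{[L]}^H \mathbf{h}\right) d\mathbf{U}, \tag{82}
\end{align}

where (82) is obtained by letting $\mathbf{k} = \mathbf{U}_{[L]}^H \mathbf{h}$ and

$$P_{\mathbf{k}}(\mathbf{k}) = \int_{U(L)\times\mathbb{R}_+^L} \frac{1}{\pi^L \prod_{i=1}^{L}\lambda_i}\, e^{-\mathbf{k}^H \mathbf{B}_L \boldsymbol{\Lambda}_L^{-1} \mathbf{B}_L^H \mathbf{k}}\, P_{\boldsymbol{\Lambda}_L}(\boldsymbol{\Lambda}_L)\, d\mathbf{B}_L\, d\boldsymbol{\Lambda}_L. \tag{83}$$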



We can then exploit the similarity of eqs. (83) and (35), and, by the same reasoning as in Section 4.2.3,
conclude directly that $\mathbf{k}$ is isotropically distributed in $U(L)$, and that its PDF depends only on its
Frobenius norm, following

ft « - si(ky^ , < kHk »- <84 » 



where 

-it (-VS) W ^ ( 2L ^£) [(i-DftL-o. - (85) 

Finally, note that $\mathbf{h}^H\mathbf{h} = \mathbf{k}^H\mathbf{k}$, and that the marginalization over the random rotation that transforms
$\mathbf{k}$ into $\mathbf{h}$ in eq. (82) preserves the isotropic property of the distribution. Therefore,



Examples of the corresponding PDFs for L = 1, 2, 4, 8, 12 and 16 are shown in Fig. 3 for a 4 × 4
channel (N = 16), together with the PDF of the instantaneous power of a Gaussian i.i.d. channel of the
same size and mean power. As expected, the energy distribution of the proposed MaxEnt model is more
spread out than the energy of a Gaussian i.i.d. channel.
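The wider spread of the channel energy can be reproduced with a small Monte Carlo experiment. The sketch below is a simulation under stated assumptions, not the analytical PDF: it assumes that the non-zero part of the covariance is an $L \times L$ complex Wishart matrix with $L$ degrees of freedom, written as $s\,\mathbf{G}\mathbf{G}^H$ with $\mathbf{G}$ having i.i.d. unit-variance complex Gaussian entries, and that the scale $s$ is set so that the mean energy equals a chosen value `mean_energy` (a hypothetical parameter, as are the function names). Since $\mathbf{h}^H\mathbf{h} = \mathbf{k}^H\mathbf{k}$, the energy can be sampled in the $L$-dimensional subspace.

```python
import numpy as np


def crandn(rng, shape):
    # i.i.d. circularly symmetric complex Gaussian samples with unit variance
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def energy_rank_l_model(L, mean_energy, size, rng):
    # Given the covariance s G G^H, k = sqrt(s) G z with z ~ CN(0, I_L) is Gaussian
    # with that covariance; E[||G z||^2] = L^2 fixes the scale s (assumed normalization).
    G = crandn(rng, (size, L, L))
    z = crandn(rng, (size, L))
    k = np.einsum("nij,nj->ni", G, z)
    return (mean_energy / L**2) * np.sum(np.abs(k) ** 2, axis=1)


def energy_iid(N, mean_energy, size, rng):
    # Gaussian i.i.d. channel with the same mean energy
    h = np.sqrt(mean_energy / N) * crandn(rng, (size, N))
    return np.sum(np.abs(h) ** 2, axis=1)


rng = np.random.default_rng(0)
N, mean_energy, size = 16, 16.0, 10000  # illustrative values (assumed)
x_iid = energy_iid(N, mean_energy, size, rng)
for L in (1, 2, 4, 8, 12, 16):
    x = energy_rank_l_model(L, mean_energy, size, rng)
    print(f"L={L:2d}  mean={x.mean():6.2f}  var={x.var():8.2f}  (i.i.d. var={x_iid.var():.2f})")
```

Under these assumptions the energy is the product of two independent Gamma-distributed variables, so its variance relative to the squared mean is $(2L+1)/L^2$, compared with $1/N$ for the Gaussian i.i.d. channel.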

The CDF of the mutual information achieved over the limited-rank (L < 16) and full-rank (L = 16)
covariance MaxEnt channels at an SNR of 15 dB is shown in Figure 4 for various ranks L, together with
that of the Gaussian i.i.d. channel. The proposed model differs mainly in the tails of the distribution:
the outage capacity for low outage probability is greatly reduced w.r.t. the Gaussian i.i.d.
channel model.
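The reduction of the outage capacity can be explored by simulation. The following is a minimal sketch under several assumptions that are not taken from the text: the mutual information is computed as $\log\det(\mathbf{I} + (\rho/n_t)\mathbf{H}\mathbf{H}^H)$ in nats, the mean energy is normalized to $N$ (unit average power per channel coefficient), and the covariance scale follows a Wishart model with $L$ degrees of freedom. It uses the isotropy of $\mathbf{h}$: the energy is sampled in the $L$-dimensional subspace and the direction is drawn uniformly on the unit sphere of $\mathbb{C}^N$. All function names are hypothetical.

```python
import numpy as np


def crandn(rng, shape):
    # i.i.d. circularly symmetric complex Gaussian samples with unit variance
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def sample_h_rank_l(N, L, mean_energy, size, rng):
    # Energy h^H h = k^H k, with k Gaussian given an L x L Wishart covariance (assumed scale).
    G = crandn(rng, (size, L, L))
    z = crandn(rng, (size, L))
    k = np.einsum("nij,nj->ni", G, z)
    energy = (mean_energy / L**2) * np.sum(np.abs(k) ** 2, axis=1)
    # Isotropic direction: normalized complex Gaussian vector in C^N.
    g = crandn(rng, (size, N))
    direction = g / np.linalg.norm(g, axis=1, keepdims=True)
    return np.sqrt(energy)[:, None] * direction


def mutual_information_nats(h, n_r, n_t, snr):
    # Assumed form: log det(I + (snr / n_t) H H^H); the ordering used to reshape h
    # into H is irrelevant here because h is isotropically distributed.
    H = h.reshape(-1, n_r, n_t)
    M = np.eye(n_r) + (snr / n_t) * (H @ H.conj().transpose(0, 2, 1))
    return np.linalg.slogdet(M)[1]


rng = np.random.default_rng(0)
n_r = n_t = 4
N = n_r * n_t
snr = 10 ** (15 / 10)  # 15 dB
size = 10000

mi_iid = mutual_information_nats(crandn(rng, (size, N)), n_r, n_t, snr)
outage_iid = np.quantile(mi_iid, 0.01)
outage = {}
for L in (4, 16):
    h = sample_h_rank_l(N, L, float(N), size, rng)
    outage[L] = np.quantile(mutual_information_nats(h, n_r, n_t, snr), 0.01)

print("1% outage (nats): i.i.d.", outage_iid, " MaxEnt:", outage)
```

The 1% quantile of the simulated mutual information is an empirical estimate of the outage capacity at that outage probability; its value depends on the assumed normalizations above.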
5 Conclusion

In this paper, the maximum entropy principle is used to derive several models of wireless flat-fading
channels for various cases of a priori knowledge about the channel properties. First, the cases of known
average channel energy and of a known upper bound on the channel energy were studied. Subsequently, the issue of
taking into account an unknown amount of spatial correlation in MIMO channel models was addressed.
The entropy-maximizing distribution of the covariance matrix under an average trace constraint was shown
to be a Wishart distribution, and the corresponding probability density function of the channel matrix
was shown to be described analytically by a function of the channel Frobenius norm. This model was
generalized to the case where the covariance matrix is rank-deficient with known rank. The proposed
channel models were compared to the commonly used Gaussian i.i.d. models in terms of the statistics
of the achieved mutual information for a given noise level. The proposed models exhibit slightly lower
average mutual information, in line with the rule that channel correlation decreases capacity, and a
higher variance than the Gaussian i.i.d. model, which reflects the presence of shadow fading.